Because I keep saying I will remember how to do things and then I don’t, I’m putting the bookmarked links and comments into one place to help me spend less time searching for answers.

1 Correlation

1.1 base R correlation: cor()

cor(mtcars[,1:4])
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000

If the data has NAs in any of the values, cor() returns NA for every correlation involving that column. To drop the NAs when calculating the correlation, use:

cor(..., use = "complete.obs")

Source: https://stackoverflow.com/questions/3798998/cor-shows-only-na-or-1-for-correlations-why
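A small sketch of the difference, assuming a copy of the first four mtcars columns with one hp value set to NA for illustration. It also shows use = "pairwise.complete.obs", another cor() option that drops NAs separately for each pair of columns instead of dropping the whole row everywhere.

```r
# Copy of the first four mtcars columns with one value blanked out (illustration only)
mt <- mtcars[, 1:4]
mt$hp[1] <- NA

# Default (use = "everything"): any correlation involving hp comes back NA
r_all <- cor(mt)
r_all["mpg", "hp"]   # NA

# use = "complete.obs": drop every row that has an NA anywhere, then correlate
r_complete <- cor(mt, use = "complete.obs")

# use = "pairwise.complete.obs": drop NAs pair by pair,
# so mpg vs cyl still uses all 32 rows
r_pairwise <- cor(mt, use = "pairwise.complete.obs")
```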

1.2 Dot Plots for Multiple Variables: pairs()

pairs() makes the chart that plots every variable against every other variable as a dot plot; useful when looking to see whether variables are correlated with each other.

#lots of variables so only look at first 4
testdf <- mtcars[,1:4]
pairs(testdf, main = "title")

2 CSV

2.1 Read (Import)

2.1.1 base R read.csv()

read.csv("example.csv")
##   ï..Name.1 Name..2. Name..3
## 1       ch1        1      10
## 2       ch2        2      12
## 3       ch3        3      13
## 4       ch4       NA      14
## 5       ch5        5      15
## 6                  6      16
## 7       ch7        7      17
  • output is data frame (base R)
  • missing for character is blank
  • missing for numeric is ‘NA’ (not colored)
  • spaces or special characters in headers are replaced with period
  • ï.. added to beginning of first column name (to remove use read.csv("example.csv", fileEncoding = 'UTF-8-BOM'); Source)
  • remove headers with header = FALSE; column names will be V1, V2, V3, etc.
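A self-contained sketch of the BOM fix and the header cleaning, assuming a throwaway temp file with made-up contents that starts with a UTF-8 byte order mark (the situation that produces the ï.. prefix). check.names = FALSE is the read.csv() argument that keeps headers exactly as written.

```r
# Write a small CSV that begins with a UTF-8 byte order mark (BOM)
f <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Name 1,Name (2)\nch1,1\n,2\n")), f)

# fileEncoding = "UTF-8-BOM" strips the BOM; headers are still cleaned by make.names()
df1 <- read.csv(f, fileEncoding = "UTF-8-BOM")
names(df1)    # "Name.1"   "Name..2."

# check.names = FALSE keeps the headers exactly as they appear in the file
df2 <- read.csv(f, fileEncoding = "UTF-8-BOM", check.names = FALSE)
names(df2)    # "Name 1"   "Name (2)"

# the blank character cell comes in as an empty string, not NA
df1$Name.1    # "ch1" ""
```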

2.1.2 readr read_csv()

library(readr) #included in tidyverse
read_csv("example.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   `Name 1` = col_character(),
##   `Name (2)` = col_double(),
##   `Name #3` = col_double()
## )
## # A tibble: 7 x 3
##   `Name 1` `Name (2)` `Name #3`
##   <chr>         <dbl>     <dbl>
## 1 ch1               1        10
## 2 ch2               2        12
## 3 ch3               3        13
## 4 ch4              NA        14
## 5 ch5               5        15
## 6 <NA>              6        16
## 7 ch7               7        17
  • output is tibble (tidyverse)
  • receive message with type of columns that R is using for import
  • missing for character and numeric is NA (colored red) in R console; outputs (html) will show <NA> for missing character and NA for missing numeric
  • headers that have spaces or special characters are kept as-is and wrapped in backticks (``)
  • remove headers with col_names = FALSE; column names will be X1, X2, X3, etc.
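A quick sketch of the read_csv() behaviour above, assuming a throwaway temp file with made-up contents and readr 2.0 or later (for the show_col_types argument, which silences the column message).

```r
library(readr)

# Small CSV with a header row and one blank character cell
f <- tempfile(fileext = ".csv")
writeLines(c("Name 1,Name (2)", "ch1,1", ",2"), f)

# Default: headers kept as-is (refer to them with backticks), blank cell -> NA
df <- read_csv(f, show_col_types = FALSE)
df$`Name 1`         # "ch1" NA

# col_names = FALSE: the header row is read as data, columns become X1, X2, ...
# (so every column here ends up as character)
df_nohead <- read_csv(f, col_names = FALSE, show_col_types = FALSE)
names(df_nohead)    # "X1" "X2"
```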

2.2 Specify Column Types

R does a pretty good job of figuring out what the column types should be, but if you need to specify them (or don’t want the default col_types message to show), they can be set with col_types:

read_csv("example.csv"
         , col_types = cols(
            `Name 1`   = col_character()
          , `Name (2)` = col_double()
          , `Name #3`  = col_double()
           )
)
## # A tibble: 7 x 3
##   `Name 1` `Name (2)` `Name #3`
##   <chr>         <dbl>     <dbl>
## 1 ch1               1        10
## 2 ch2               2        12
## 3 ch3               3        13
## 4 ch4              NA        14
## 5 ch5               5        15
## 6 <NA>              6        16
## 7 ch7               7        17
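The same specification can also be written as readr's compact string, one letter per column (c = character, d = double; i = integer, l = logical, D = date, ? = guess and _ = skip are also accepted). A sketch assuming a made-up temp file with the same three headers:

```r
library(readr)

f <- tempfile(fileext = ".csv")
writeLines(c("Name 1,Name (2),Name #3", "ch1,1,10", ",6,16"), f)

# "cdd" = Name 1 character, Name (2) double, Name #3 double;
# supplying col_types also means no column-specification message is printed
df <- read_csv(f, col_types = "cdd")
```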

2.3 Write (Export)

Either write_csv() or write.csv() can be used; they have slightly different functionality.

test <- read_csv("example.csv"
         , col_types = cols(
            `Name 1`   = col_character()
          , `Name (2)` = col_double()
          , `Name #3`  = col_double()
           )
)
write_csv(test, "example_export1.csv")
write.csv(test, "example_export2.csv")

Row names (write.csv() only): row.names = TRUE to include; row.names = FALSE to exclude

  • write.csv() default includes row names (usually row number)
  • write_csv() default does not include row names; CANNOT ADD

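NA values; na = "" to have missing data be exported as a blank cell

  • If data set is a base R data frame (imported with read.csv()):
    • write.csv() default is na = "NA" for numeric; character is always blank, because read.csv() reads a missing character value in as an empty string "" rather than NA, so there is no NA for na to replace (read.csv(..., na.strings = c("", "NA")) imports them as NA instead)
    • write_csv() default is na = "NA" for numeric; character is always blank (same reason)
  • If data set is a tibble (imported with read_csv()):
    • write.csv() default is na = "NA" for numeric and character
    • write_csv() default is na = "NA" for numeric and character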
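A sketch comparing the two writers on a tiny made-up data frame with one NA in each column, written to temp files: write.csv() with row names dropped and na = "", and write_csv() with na = "". Note that write.csv() quotes character values and headers by default, while write_csv() only quotes when needed.

```r
library(readr)

# One missing value in each column (made-up data)
df <- data.frame(name = c("a", NA), n = c(1, NA))

f_base  <- tempfile(fileext = ".csv")
f_readr <- tempfile(fileext = ".csv")

# Base R: drop the row-name column and write NA as an empty cell
write.csv(df, f_base, row.names = FALSE, na = "")

# readr: no row names by default; na = "" writes NA as an empty cell
write_csv(df, f_readr, na = "")

readLines(f_base)
readLines(f_readr)
```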

3 Dates and Times

3.1 Date Formats

| String | Meaning | Code | Output |
|---|---|---|---|
| %a | Day of the week, abbreviated (Mon-Sun) | format(as.Date("2020-12-10"), "%a") | Thu |
| %A | Day of the week, full (Monday-Sunday) | format(as.Date("2020-12-10"), "%A") | Thursday |
| %w | Day of the week, numeric, 0 = Sunday (0-6) | format(as.Date("2020-12-10"), "%w") | 4 |
| %e | Day of month, space-padded (1-31) | format(as.Date("2020-12-10"), "%e") | 10 |
| %d | Day of month (01-31) | format(as.Date("2020-12-10"), "%d") | 10 |
| %m | Month, numeric (01-12) | format(as.Date("2020-12-10"), "%m") | 12 |
| %b | Month, abbreviated (Jan-Dec) | format(as.Date("2020-12-10"), "%b") | Dec |
| %B | Month, full (January-December) | format(as.Date("2020-12-10"), "%B") | December |
| %y | Year, without century (00-99) | format(as.Date("2020-12-10"), "%y") | 20 |
| %Y | Year, with century (0000-9999) | format(as.Date("2020-12-10"), "%Y") | 2020 |
| %j | Day of the year (001-366) | format(as.Date("2020-12-10"), "%j") | 345 |
| %U | Week of year, numeric, weeks starting on Sunday (00-53) | format(as.Date("2020-12-10"), "%U") | 49 |
| %W | Week of year, numeric, weeks starting on Monday (00-53) | format(as.Date("2020-12-10"), "%W") | 49 |
| %x | Locale-specific date | format(as.Date("2020-12-10"), "%x") | 12/10/2020 |
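The same codes work in both directions: format() turns a Date into text, and as.Date(..., format = ) parses text into a Date (the format string has to match the text exactly). A short sketch using the same example date; the numeric codes are used so the output does not depend on the locale.

```r
d <- as.Date("2020-12-10")

# Date -> text
format(d, "%d/%m/%Y")   # "10/12/2020"
format(d, "%j")         # "345"

# text -> Date: US-style month/day/year string
d2 <- as.Date("12/10/2020", format = "%m/%d/%Y")
d2                      # "2020-12-10"
```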

3.2 Time Formats

All examples use x <- as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago")

| String | Meaning | Code | Output |
|---|---|---|---|
| %S | Second (00-59) | format(x, "%S") | 10 |
| %M | Minute (00-59) | format(x, "%M") | 30 |
| %l | Hour, in 12-hour clock, space-padded (1-12) | format(x, "%l") | 3 |
| %I | Hour, in 12-hour clock (01-12) | format(x, "%I") | 03 |
| %p | AM/PM | format(x, "%p") | PM |
| %H | Hour, in 24-hour clock (00-23) | format(x, "%H") | 15 |
| %X | Locale-specific time | format(x, "%X") | 3:30:10 PM |
| %c | Locale-specific date and time | format(x, "%c") | Thu Dec 10 15:30:10 2020 |
| %z | Offset from GMT | format(x, "%z") | -0600 |
| %Z | Time zone (character) | format(x, "%Z") | CST |
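Codes can be combined into one format string, and format() on a POSIXct also takes a tz argument that displays the same instant in another time zone. A sketch using the same Central-time value as the examples above (the UTC conversion is just an illustration):

```r
dt <- as.POSIXct("2020-12-10 15:30:10", tz = "America/Chicago")

format(dt, "%H:%M:%S %Z")          # "15:30:10 CST"
format(dt, "%z")                   # "-0600"

# Same moment shown in UTC (six hours ahead of CST)
format(dt, "%H:%M %Z", tz = "UTC") # "21:30 UTC"
```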

3.3 Time Zone options in R

The examples above use US Central time, so tz = "America/Chicago"; other time zone names can be found using the code below:

#check system time zone
Sys.timezone(location = TRUE)
## [1] "America/Los_Angeles"
#other time zone options (only show first 20)
OlsonNames()[1:20]
##  [1] "Africa/Abidjan"       "Africa/Accra"         "Africa/Addis_Ababa"  
##  [4] "Africa/Algiers"       "Africa/Asmara"        "Africa/Asmera"       
##  [7] "Africa/Bamako"        "Africa/Bangui"        "Africa/Banjul"       
## [10] "Africa/Bissau"        "Africa/Blantyre"      "Africa/Brazzaville"  
## [13] "Africa/Bujumbura"     "Africa/Cairo"         "Africa/Casablanca"   
## [16] "Africa/Ceuta"         "Africa/Conakry"       "Africa/Dakar"        
## [19] "Africa/Dar_es_Salaam" "Africa/Djibouti"
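Rather than scrolling through the full list, the names can be searched with grep(), and a name can be checked before it is used in tz = (an unrecognised zone name may not raise an error, so a typo can slip through). A minimal sketch:

```r
# Search the time zone names instead of printing all of them
chicago <- grep("Chicago", OlsonNames(), value = TRUE)
chicago

# Confirm a name is valid before using it in tz =
"America/Chicago" %in% OlsonNames()   # TRUE
```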

4 ggplot

# load-packages
library(tidyverse)

4.2 hjust and vjust

Why do I always forget the direction of these?

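hjust: 0 = left-aligned, 0.5 = center, 1 = right-aligned
vjust: 0 = bottom-aligned, 0.5 = middle, 1 = top-aligned (so in geom_text(), vjust = 0 puts the label above the point and vjust = 1 puts it below, because the bottom or top edge of the text sits on the point)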

Visual Example - R-bloggers
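A small sketch to check the directions visually, assuming made-up label data and the ggplot2 package: one label per vjust value, plus a right-aligned title via hjust in theme().

```r
library(ggplot2)

# Three points, each with a label that uses a different vjust (made-up data)
lab_df <- data.frame(x = 1:3, y = 1, lab = c("vjust = 0", "vjust = 0.5", "vjust = 1"))

p <- ggplot(lab_df, aes(x, y, label = lab)) +
  geom_point() +
  # vjust = 0: bottom of the text sits on the point, label appears above it
  geom_text(data = lab_df[1, ], vjust = 0) +
  # vjust = 0.5: text centred vertically on the point
  geom_text(data = lab_df[2, ], vjust = 0.5) +
  # vjust = 1: top of the text sits on the point, label appears below it
  geom_text(data = lab_df[3, ], vjust = 1) +
  # hjust = 1 right-aligns the title
  ggtitle("right-aligned title") +
  theme(plot.title = element_text(hjust = 1))
p
```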

4.3 Math Expressions in labels

4.3.1 Use quote()

ggplot(mpg, aes(displ, hwy))+geom_point()+
  ggtitle(
    quote(
      alpha ^ 2 - frac(1, 10) + sum(n[i], i==1, N)
                )
    )

4.3.2 Use TeX() from the latex2exp package

  • must be in a string
  • must be denoted as math mode with dollar signs
  • must include 2 backslashes for LaTeX commands (e.g. \\alpha)
library(latex2exp)
ggplot(mpg, aes(displ, hwy))+geom_point()+
  ggtitle(TeX(
    "$\\alpha^2 - \\frac{1}{10} + \\sum_{i}^N n_i$"
                )
    )

4.5 Line up axes on stacked plots

Sometimes I’m working on two different types of plots (like a bar chart and a scatter plot) that happen to have the same x-axis. I want to line up these axes so that when the plots are stacked the values correspond to the same date.

4.5.1 gridExtra::grid.arrange() and cowplot::plot_grid()

# two different bar charts
A <- ggplot(mpg, aes(class))+geom_bar()+coord_flip()+ylim(0, 109)
B <- ggplot(mpg, aes(drv))+geom_bar()+coord_flip()+ylim(0, 109)

The grid.arrange() command from the gridExtra package does not line up the axes.

#axes don't line up
gridExtra::grid.arrange(A, B, ncol=1)

Instead, convert the plots to grobs and use the grid.draw() command from the grid package.
Source

#make plots into Grobs (grid graphical object)
gA <- ggplotGrob(A) 
gB <- ggplotGrob(B)
grid::grid.draw(rbind(gA, gB))

The cowplot::plot_grid() function allows you to line up plots by a specific axis.

cowplot::plot_grid(A, B, ncol = 1, align = "v")

4.5.2 Facets

Another option is facet_wrap() or facet_grid(), which works if the axes are the same for the different variables you want to compare; be careful, though, as facets are meant for comparing items with the same measurements.

tidy.df <- pivot_longer(mpg, c(class, drv), names_to = "category", values_to = "type")

ggplot(tidy.df, aes(type))+
  geom_bar()+
  coord_flip()+ 
  facet_wrap(
      ~category
    ,  ncol = 1
    , scales = "free" #removes types from the axis if that category has 0 cars of that type 
  )

ggplot(tidy.df, aes(type))+
  geom_bar()+
  coord_flip()+ 
  facet_grid(
      category ~ . 
    , scales = "free" #removes types from the axis if that category has 0 cars of that type 
    , space  = "free" #spaces based on number of obs (i.e. number of bars);
                      #         rather than giving each facet equal sizing
  )

4.5.3 Mixed Geoms (Bar + Scatter)

Scatter plots and bar charts will not line up automatically, even when using the grid.draw() command detailed above. This is because their default x-axis limits differ: each bar is centered on its x-value and extends half a bar-width on either side, while a point sits exactly on its x-value.

#work with smaller subset of data from economics, part of ggplot2 package 
startdate <- "2014-06-01"
economics_small <- economics %>%
  filter(date >= as.Date(startdate)) %>%
  arrange(date)
A <- ggplot(economics_small, aes(date, unemploy))+
  geom_bar(stat="identity")+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)

B <- ggplot(economics_small, aes(date, uempmed))+
  geom_point()+geom_line()+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)

gA <- ggplotGrob(A) 
gB <- ggplotGrob(B)
grid::grid.draw(rbind(gA, gB)) #cowplot::plot_grid(A, B, ncol = 1, align = "v") produces same result 

There are a couple of options to line them up.

4.5.3.1 Fix xlim for all charts

If you set the lower limit to the first x-value, the first bar is dropped (remember each bar is centered over its value, so half of it falls outside the limit), hence the warning below.

A <- ggplot(economics_small, aes(date, unemploy))+
  geom_bar(stat="identity")+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)+
  xlim(as.Date(startdate), NA)

B <- ggplot(economics_small, aes(date, uempmed))+
  geom_point()+geom_line()+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)+
  xlim(as.Date(startdate), NA)

gA <- ggplotGrob(A) 
## Warning: Removed 1 rows containing missing values (geom_bar).
gB <- ggplotGrob(B)
grid::grid.draw(rbind(gA, gB))

This can be fixed by extending the x-axis by half a unit (i.e. setting the lower limit half a unit below the smallest x-value). Here the unit is a month, so half a unit is ~15 days.

HalfUnit <- .5*(economics_small$date[2] - economics_small$date[1])
HalfUnit
## Time difference of 15 days
A <- ggplot(economics_small, aes(date, unemploy))+
  geom_bar(stat="identity")+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)+
  xlim(as.Date(startdate)-HalfUnit, NA)

B <- ggplot(economics_small, aes(date, uempmed))+
  geom_point()+geom_line()+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)+
  xlim(as.Date(startdate)-HalfUnit, NA)

gA <- ggplotGrob(A) 
gB <- ggplotGrob(B)
grid::grid.draw(rbind(gA, gB))

4.5.3.2 Shift Bar chart to right

Bar charts are automatically centered over the x-value. Bar charts (and any geom) can be shifted with position = position_nudge(). The shift needs to be half a unit on the x-axis; again, this is monthly data, so half a unit is ~15 days.
Source

A <- ggplot(economics_small, aes(date, unemploy))+
  geom_bar(stat="identity", position = position_nudge(x = as.vector(HalfUnit)))+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)

B <- ggplot(economics_small, aes(date, uempmed))+
  geom_point()+geom_line()+
  geom_vline(xintercept = as.Date(startdate), color="red", size=2)

gA <- ggplotGrob(A) 
gB <- ggplotGrob(B)

grid::grid.draw(rbind(gA, gB))

5 if else

The test expression goes in parentheses () and the statement goes in curly brackets {}.

5.1 If

if (test) { statement } 

5.2 If else

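R is a bit finicky about where the brackets go: putting else on a new line by itself gives an error ("unexpected 'else'") at the top level, because R treats the if statement as complete once the line ends after }. Keep the closing bracket and else on the same line: } else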

if (test) {
statement #1
} else {
statement #2 
}
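A sketch that reproduces the else-on-its-own-line error without stopping the script, by parsing the code from a string with parse() and catching the error. It also shows that the same layout parses when wrapped in an outer pair of braces (e.g. inside a function body), since R then keeps reading until the outer } closes; } else on one line is still the safer habit.

```r
# Top level: the if statement is complete after "}", so "else" starts a new
# expression and the parser rejects it
top_level <- tryCatch(
  parse(text = "if (TRUE) {\n  1\n}\nelse {\n  2\n}"),
  error = function(e) "syntax error"
)
top_level        # "syntax error"

# Wrapped in braces: R reads to the outer "}", so the same layout is accepted
wrapped <- parse(text = "{\n  if (TRUE) {\n    1\n  }\n  else {\n    2\n  }\n}")
eval(wrapped)    # 1
```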

5.3 If else if else if … else

if (test1) {
statement #1
} else if (test2) {
statement #2 
} else if (test3) {
statement #3 
} else {
statement #4 
}
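A runnable version of the chain, using a hypothetical helper describe_number(). Each else if is only checked when every test above it was FALSE. The test must be a single TRUE/FALSE: since R 4.2.0, passing a condition of length greater than one to if() is an error.

```r
# Hypothetical helper: label a single number
describe_number <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else if (x < 10) {
    "small positive"
  } else {
    "large positive"
  }
}

describe_number(-2)   # "negative"
describe_number(5)    # "small positive"
```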

5.4 ifelse()

ifelse() is vectorized (it checks the condition element by element), so it is preferred when making adjustments to data set variables; if () only accepts a single TRUE/FALSE.

ifelse(condition, value if true, value if false)
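A short sketch of ifelse() on a vector and on a data frame column (a copy of mtcars, with an arbitrary 25 mpg cut-off chosen for illustration). The result has the same length as the condition, and an NA in the condition gives NA in the result.

```r
x <- c(-2, 0, 5, NA)

# Element-by-element: one result per element of x
labels <- ifelse(x > 0, "positive", "not positive")
labels   # "not positive" "not positive" "positive" NA

# Typical use: create a new column (cut-off of 25 mpg is arbitrary)
mt2 <- mtcars
mt2$efficient <- ifelse(mt2$mpg > 25, "yes", "no")
```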

5.5 case_when() (instead of nested ifelse() statements)

RDocumentation: case_when()

case_when(
    x == val1 ~ output1
  , x == val2 ~ output2
  , x == val3 ~ output3
  #if x doesn't fit into above values can set a catch-all output
  #if catch-all output is not defined; output will be NA for x's that don't meet conditions
  , TRUE      ~ everythingelse 
)
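A filled-in version of the template using the dplyr package and a made-up vector. Conditions are checked top to bottom and the first TRUE one wins, so the order matters (x == 1 would also satisfy x <= 5). An NA in a condition is treated as FALSE, so NA values fall through to the catch-all. In dplyr 1.1.0 and later, .default = "other" can be used instead of the TRUE ~ line.

```r
library(dplyr)

x <- c(1, 2, 3, 10, NA)

res <- case_when(
    x == 1  ~ "one"
  , x == 2  ~ "two"
  , x <= 5  ~ "small"
  , TRUE    ~ "other"   # catch-all; NA x also ends up here
)
res   # "one" "two" "small" "other" "other"
```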

6 Images

6.1 Markdown

![this is Daffodil and Blossom](img/Peegs.jpg) this is Daffodil and Blossom

Resizing images with R chunk options is required if you are knitting to a PDF, because HTML code can’t be used there.

Tips and tricks for working with images and figures in R Markdown documents - hollie@zevross.com

Adjust the out.width and out.height in the R chunk options

{r, out.width="50%"}  
img <- "img/Peegs.jpg" #path to image  
knitr::include_graphics(img) #in the knitr package   

6.2 HTML in markdown

In my opinion, HTML is a lot easier to use for image options.

HTML Images

<img src="img/Peegs.jpg" alt="this is Daffodil and Blossom" width="50%">
this is Daffodil and Blossom

7 Select/Remove Based on Pattern

7.1 Select data frames based on pattern

thispatternexample <- "example1"
patternexample     <- "example2"
thispattern        <- "example3"

mget(ls(pattern = "pattern"))
## $patternexample
## [1] "example2"
## 
## $thispattern
## [1] "example3"
## 
## $thispatternexample
## [1] "example1"
mget(ls(pattern = "^pattern"))
## $patternexample
## [1] "example2"
mget(ls(pattern="^prefix\\..*")) # ^ = begins with, \\. = a literal '.', .* = anything after it (such as prefix.suffix)

# combine all the df's (if they have the same variables) with bind_rows(), or with base R do.call(rbind, ...)
bind_rows(mget(ls(pattern="^prefix\\..*")))
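A self-contained sketch of the select-and-combine step, assuming three made-up data frames where only two share the prefix. mget() returns a named list, which bind_rows() (dplyr) accepts directly; base rbind() needs do.call() to unpack the list.

```r
library(dplyr)

# Made-up data frames; only the first two match the pattern
prefix.a <- data.frame(id = 1:2, value = c(10, 20))
prefix.b <- data.frame(id = 3,   value = 30)
other    <- data.frame(id = 99,  value = 0)

# Named list of every object whose name starts with "prefix."
dfs <- mget(ls(pattern = "^prefix\\."))
names(dfs)   # "prefix.a" "prefix.b"

# .id adds a column recording which data frame each row came from
combined_dplyr <- bind_rows(dfs, .id = "source")
combined_base  <- do.call(rbind, dfs)
```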

7.2 Remove multiple items by pattern

rm(list = ls(pattern = "pattern"))
rm(list = ls(pattern = "^prefix\\.")) #no trailing .* needed (it is optional with mget() too): the pattern only has to match the start of the name
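Since rm() by pattern deletes silently, it can help to preview the matches first. A sketch that works in a scratch environment (created just for this example) so nothing in the global workspace is touched; ls() and rm() both take envir =.

```r
# Scratch environment with made-up objects
scratch <- new.env()
assign("prefix.a", 1, envir = scratch)
assign("prefix.b", 2, envir = scratch)
assign("keep_me",  3, envir = scratch)

# Preview what the pattern matches before deleting anything
to_remove <- ls(envir = scratch, pattern = "^prefix\\.")
to_remove            # "prefix.a" "prefix.b"

rm(list = to_remove, envir = scratch)
ls(scratch)          # "keep_me"
```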

8 Versions

8.1 Install Older Package from Source

R Studio Support - Installing older versions of packages

Package Archive

packageurl <- "https://cran.r-project.org/src/contrib/Archive/<package>/<package>_<version>.tar.gz"
install.packages(packageurl, repos=NULL, type="source")

8.2 Install Older Version of R (Windows)

If a program was built entirely in an older version of R, it may be difficult to get it to work with an updated version of R. When there isn’t time to investigate and re-code, installing an older version of R is an option.

  1. Go to cran.r-project.org
  2. Click the ‘Download R for Windows’ link
  3. Click on the ‘base’ link
  4. Click on the ‘Previous releases’ link
  5. Click on the specific R version you want, say R 3.5.3
  6. Click the download link on that page; this will download an .exe file
  7. Run the .exe file to install the older version of R
  7. Run the exe file to install the older version of R

8.3 Switch between versions of R

  1. Open RStudio
  2. Tools -> Global Options…
  3. General -> R Version -> Change…
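After switching (and restarting RStudio), a quick check confirms which R the session is actually running. getRversion() returns a version object, so comparisons are numeric rather than text-based; the "4.0.0" threshold below is only an example.

```r
# Full version string of the running R
R.version.string

# Version object: compares by number, so "3.10.0" > "3.5.3" as expected
v <- getRversion()
v >= "4.0.0"   # TRUE or FALSE depending on the installed version
```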